H-CLAP: hierarchical clustering within a linear array with an application in genetics.

Authors

  • Samiran Ghosh
  • Jeffrey P Townsend
Abstract

In most cases where clustering of data is desirable, the underlying data distribution to be clustered is unconstrained. However, clustering of site types in a discretely structured linear array, as is often desired in studies of linear sequences such as DNA, RNA, or proteins, represents a problem where data points are not necessarily exchangeable and are directionally constrained within the array. Each position in the linear array is fixed and is either "marked" (i.e., of interest, such as polymorphic or substitute sites) or "non-marked." Here we describe a method for clustering those marked sites. Because the cluster-generating process is constrained by discrete locality inside such an array, traditional clustering methods need adjustment to be appropriate. We develop a hierarchical Bayesian approach. We adopt a Markov clustering algorithm, revealing any natural partitioning in the pattern of marked sites. The resulting recursive partitioning and clustering algorithm is named hierarchical clustering in a linear array (H-CLAP). It employs domain-specific directional constraints directly in the likelihood construction. Our method, being fully Bayesian, is more flexible in cluster discovery than a standard agglomerative hierarchical clustering algorithm. It provides not only a hierarchical clustering but also cluster boundaries, which may have their own biological significance. We have tested the efficacy of our method on several data sets, including two biological and several simulated ones.
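To make the problem setting concrete, the sketch below represents a linear array as a list of 0/1 flags (1 for a marked site) and recursively partitions it into contiguous segments. It is not the H-CLAP algorithm: the Bayesian hierarchical model is replaced here by a greedy maximum-likelihood split under a per-segment Bernoulli model, and the function names and the logarithmic stopping penalty are assumptions made purely for illustration. What it shares with the abstract's description is only the constraint that clusters must be contiguous runs of positions and that the output includes explicit boundaries.

```python
import math


def bernoulli_loglik(marked, total):
    """Maximized Bernoulli log-likelihood of a segment with `marked` ones in `total` sites."""
    if total == 0 or marked == 0 or marked == total:
        return 0.0  # a homogeneous (or empty) segment is fit perfectly
    p = marked / total
    return marked * math.log(p) + (total - marked) * math.log(1 - p)


def best_split(sites, lo, hi):
    """Return (gain, k): the best boundary k with lo < k < hi for sites[lo:hi]."""
    total_marked = sum(sites[lo:hi])
    whole = bernoulli_loglik(total_marked, hi - lo)
    best_gain, best_k = 0.0, None
    left_marked = 0
    for k in range(lo + 1, hi):
        left_marked += sites[k - 1]  # marked sites in sites[lo:k]
        gain = (bernoulli_loglik(left_marked, k - lo)
                + bernoulli_loglik(total_marked - left_marked, hi - k)
                - whole)
        if gain > best_gain:
            best_gain, best_k = gain, k
    return best_gain, best_k


def partition(sites, lo=0, hi=None, penalty=None):
    """Recursively split sites[lo:hi]; return the sorted list of boundary indices.

    `penalty` is a hypothetical stopping threshold (default: log of the array
    length); a split is accepted only if its log-likelihood gain exceeds it.
    """
    if hi is None:
        hi = len(sites)
    if penalty is None:
        penalty = math.log(max(len(sites), 2))
    gain, k = best_split(sites, lo, hi)
    if k is None or gain <= penalty:
        return []
    return partition(sites, lo, k, penalty) + [k] + partition(sites, k, hi, penalty)


# A toy array: a dense run of marked sites flanked by unmarked regions.
toy_sites = [0] * 20 + [1] * 10 + [0] * 20
boundaries = partition(toy_sites)
```

On the toy array, the boundaries delimit the dense run of marked sites. A fully Bayesian treatment, as the abstract describes, would instead place priors on the segment parameters and boundaries rather than relying on a fixed penalty.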


Similar resources

Application of clustering methods in DNA microarrays

Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large volume of raw data includes normalization, clustering, and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...


A Thinning Method of Linear And Planar Array Antennas To Reduce SLL of Radiation Pattern By GWO And ICA Algorithms

In recent years, optimization techniques using evolutionary algorithms have been widely used to solve electromagnetic problems. These algorithms thin the antenna arrays with the aim of reducing complexity, thereby achieving the optimal solution and decreasing the side lobe level. To obtain the optimal solution, thinning is performed by removing some elements in an array thro...


Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation

1- INTRODUCTION The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Differing estimates of sediment amounts, along with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...


Application of 3D-QSAR on a Series of Potent P38-MAP Kinase Inhibitors

One of the most widely applied methods in the drug industry for the development of new drugs is the 3D-QSAR methodology. As p38-mitogen-activated protein kinase (p38-MAPK) plays a crucial role in regulating the production of proinflammatory cytokines such as tumor necrosis factor-α (TNF-α) and interleukin-1, and is emerging as an attractive target for new anti-inflammatory agents, we used a 3D-QSAR based method of Compa...


Development a New Technique Based on Least Square Method to Synthesize the Pattern of Equally Space Linear Arrays

Using sampled data of a desired pattern is a common technique in synthesizing the array factor (AF) pattern of antenna arrays. Based on the obtained data matrix, the Least Square Method (LSM) is used to calculate the excitation coefficients of the array elements. The most important parameter, which affects the accuracy and complexity of the calculation, is the sampling rate of the desired pattern. Classic...



Journal:
  • Statistical applications in genetics and molecular biology

Volume 14, Issue 2

Pages -

Publication date: 2015